A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning
نویسندگان
چکیده
In this paper, we consider the supervised learning task which consists in predicting the normalized rank of a numerical variable. We introduce a novel probabilistic approach to estimate the posterior distribution of the target rank conditionally to the predictors. We turn this learning task into a model selection problem. For that, we define a 2D partitioning family obtained by discretizing numerical variables and grouping categorical ones and we derive an analytical criterion to select the partition with the highest posterior probability. We show how these partitions can be used to build univariate predictors and multivariate ones under a naive Bayes assumption. We also propose a new evaluation criterion for probabilistic rank estimators. Based on the logarithmic score, we show that such criterion presents the advantage to be minored, which is not the case of the logarithmic score computed for probabilistic value estimator. A first set of experimentations on synthetic data shows the good properties of the proposed criterion and of our partitioning approach. A second set of experimentations on real data shows competitive performance of the univariate and selective naive Bayes rank estimators projected on the value range compared to methods submitted to a recent challenge on probabilistic metric regression tasks. Our approach is applicable for all regression problems with categorical or numerical predictors. It is particularly interesting for those with a high number of predictors as it automatically detects the variables which contain predictive information. It builds pertinent predictors of the normalized rank of the numerical target from one or several predictors. As the criteria selection is regularized by the presence of a prior and a posterior term, it does not suffer from overfitting.
منابع مشابه
A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis
Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...
متن کاملSupport vector regression with random output variable and probabilistic constraints
Support Vector Regression (SVR) solves regression problems based on the concept of Support Vector Machine (SVM). In this paper, a new model of SVR with probabilistic constraints is proposed that any of output data and bias are considered the random variables with uniform probability functions. Using the new proposed method, the optimal hyperplane regression can be obtained by solving a quadrati...
متن کاملA Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis
Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...
متن کاملRule-based joint fuzzy and probabilistic networks
One of the important challenges in Graphical models is the problem of dealing with the uncertainties in the problem. Among graphical networks, fuzzy cognitive map is only capable of modeling fuzzy uncertainty and the Bayesian network is only capable of modeling probabilistic uncertainty. In many real issues, we are faced with both fuzzy and probabilistic uncertainties. In these cases, the propo...
متن کاملNumerical Meshless Method in Conjunction with Bayesian Theorem for Electrical Tomography of Concrete
Electric potential measurement technique (tomography) was introduced as a nondestructive method to evaluate concrete properties and durability. In this study, numerical meshless method was developed to solve a differential equation which simulates electric potential distribution for concrete with inclusion in two dimensions. Therefore, concrete samples with iron block inclusion in different loc...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Machine Learning Research
دوره 8 شماره
صفحات -
تاریخ انتشار 2007